This notebook supports an asynchronous lecture meant to introduce you to some basics of spatial data. The lecture will be provided via Echo360.

I will provide you with a partially completed notebook to assist your work.

This notebooks will focus on “spatial data” and also accessing demographic data in R.

Key topics for today:

Packages

Standards:

library(knitr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(lubridate) # because we will probably see some dates
library(here) # more easily access files in your project
## here() starts at C:/Users/lucas/OneDrive/Desktop/bikeshare_2023

Some additional packages focuses on today’s work:

library(sf) # working with simple features - geospatial
## Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
library(tmap)
## Warning: package 'tmap' was built under R version 4.3.2
## Breaking News: tmap 3.x is retiring. Please test v4, e.g. with
## remotes::install_github('r-tmap/tmap')
library(tidycensus)
## Warning: package 'tidycensus' was built under R version 4.3.2

A link to a book on tmap: https://r-tmap.github.io/

Using the Neighborhood Geospatial Data (using /data)

I will use the GeoJSON file. (Newer, not necessarily better, but … a single file. Not smaller, but … this one is not big.)

https://opendata.dc.gov/datasets/DCGIS::dc-health-planning-neighborhoods/about

Using the Neighborhood Geospatial Data (using /data)

Load the neighborhood geospatial data as neigh.

neigh = st_read(here("data_raw", "DC_Health_Planning_Neighborhoods.geojson")) %>% clean_names()
## Reading layer `DC_Health_Planning_Neighborhoods' from data source 
##   `C:\Users\lucas\OneDrive\Desktop\bikeshare_2023\data_raw\DC_Health_Planning_Neighborhoods.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 51 features and 8 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -77.11976 ymin: 38.79165 xmax: -76.9094 ymax: 38.99556
## Geodetic CRS:  WGS 84

Ensure it plots.

plot(neigh)

Investigating joining spatial and non-spatial data

Download the DC covid datase for positive cases and store at an appropriate place in your project.

Read the data as df_c and be sure to clean names.

df_c = read_csv(here("data_raw", "DC_COVID-19_Total_Positive_Cases_by_Neighborhood.csv")) %>% clean_names()
## Rows: 26372 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): DATE_REPORTED, NEIGHBORHOOD
## dbl (2): OBJECTID, TOTAL_POSITIVES
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Now - let’s focus on a particular date (for no reason other than simplifying our analysis).

df_cases = df_c %>%
  filter(as_date(date_reported) == "2021-11-17") %>%
  separate(neighborhood, into=c("code", "name"), sep = ":")

Create the dataframe df_cases:

df_cases=df_c %>%
  filter(as_date(date_reported) == "2021-11-17") %>% 
  separate(neighborhood,into=c("code","name"),sep = ":") %>%
  mutate(code=case_when(code=="N35" ~"N0",
                        TRUE ~ code)) %>%
  select(-date_reported)

Regular joining (of dataframes)

Join the dataframes and make a chloropleth map using tmap.

neigh2 = left_join(neigh, df_cases, by = c("code"))

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(neigh2) + tm_polygons("total_positives", alpha = 0.5)

Joining with other spatial data

Let’s get some data using tidycensus. Need an API key https://api.census.gov/data/key_signup.html

census_api_key("fd79001f8e491bb60620a3e5dfeb02656d0af22e")
## To install your API key for use in future sessions, run this function with `install = TRUE`.

What data is available — and what is the variable name?

(We are interested in the 5year American Community Survey.)

#what variables
v20 = load_variables(2018, "acs5")
#Variables include "Sex by Age" (B01001 variables), Race (B02001), and "People Recording Single Ancestry" (B04004), amongst others.

Get some data:

df_census=get_acs(geography = "tract",
                  variables=c("median_inc"="B06011_001",
                              "pop"="B01001_001",
                              "pop_black"="B02009_001"),
                  state="DC",geometry=TRUE,year=2021) 
## Getting data from the 2017-2021 5-year ACS
## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |======================================================================| 100%

Make a plot to verify that you read the data:

plot(df_census)

A BETTER VISUALIZATION

It’s in long format. Let’s make it wide.

df_cens=df_census %>% 
  select(-moe) %>% 
  pivot_wider(names_from = "variable", 
              values_from = "estimate")
  
 

tm_shape(df_cens) +tm_polygons("median_inc",alpha=.5)

How to join

Consider this problem:

  tm_shape(neigh2) +tm_borders(col="blue",lwd=5,alpha=.2)+
  tm_shape(df_cens) +tm_borders(col="red",lwd=1,alpha=.3)

OK - follow the challenging code elements:

You need to add a coordinate system to the census data:

df_cens_adj = df_cens %>% st_transform(4326)

But which way do we join — and — think about how it should “aggregate” the data.

df_j = st_join(df_cens_adj, neigh2, largest = TRUE)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
df_j_rev = st_join(neigh2, df_cens_adj, largest = TRUE)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries

Since we want the geometry for the NEIGHBORHOODS, we need a different work a little harder:

df1=df_j %>% select(median_inc,pop,pop_black,code) %>%
  group_by(code) %>%
  summarise(pop_n=sum(pop),
            pop_black_n=sum(pop_black), 
            adj_median_income=sum(pop*median_inc)/pop_n) 

plot(df1)

Now that we are aggregating in the right way, we can join.

#df2=left_join(neigh2,df1)

df2=left_join(neigh2,df1 %>% st_set_geometry(NULL))
## Joining with `by = join_by(code)`

And visualize:

df2=df2 %>% mutate(black_perc=pop_black_n/pop_n, covid_rate=total_positives/pop_n)
tm_shape(df2)+tm_polygons(c("adj_median_income","covid_rate","black_perc"))

Improve that visualization:

df2 %>% filter(code!="N0") %>%
  tm_shape() + tm_polygons(c("adj_median_income", "covid_rate", "black_perc"), alpha = .4)